The particle physics standard model (SM) is remarkably
successful, but is believed to require expansion beyond
the electroweak scale. A variety of possible extensions
have been proposed. Many analyses optimized for
specific signatures have been performed to search for evi- 
dence of these possibilities. Limits have been set on cross 
sections for postulated processes and on masses of hypo- 
thetical particles, but no conclusive indication of physics 
beyond the standard model has yet been seen [1].

This Letter summarizes a broad search for new physics 
at the electroweak scale without focusing on any specific
proposed scenario. The detailed writeup is provided in
Ref. [2]. Events containing one or more particles
produced at large transverse momentum collected
by the CDF experiment in Run II of the Fermilab Teva- 
tron are analyzed for discrepancies relative to the stan- 
dard model prediction. A model-independent approach 
(Vista) considers gross features of the data, and is
sensitive to new large cross section physics. A
quasi-model-independent approach (Sleuth) emphasizes events with
large summed scalar transverse momentum, and is partic- 
ularly sensitive to new electroweak scale physics. These 
global algorithms provide a complementary approach to 
searches optimized for more specific new physics scenarios.
Searches in a similar spirit have previously been performed
by the D0 Collaboration in Tevatron Run I and by the H1
Collaboration at HERA-I.

This search for new physics is designed with the in- 
tention of maximizing the chance for discovery, rather 
than excluding model parameter space if no discrepancy 
is found. Discrepancies between data and a complete 
standard model background estimate are identified in a 
global sample of high transverse momentum (high-$p_T$)
collision events. Three statistics are employed to iden- 
tify and quantify disagreement: populations of exclusive 
final states defined by the objects the events contain, 
shapes of kinematic distributions, and excesses on the 
tail of summed scalar transverse momentum distribu- 
tions. These statistics identify discrepancies worthy of 
further study. 

A discovery claim can be made to the extent that a 
highlighted discrepancy can be demonstrated to be not 
due to a statistical fluctuation, a mismodeling of the de- 
tector response, or an inadequate implementation of the 
standard model prediction, and must therefore be due 
to some new underlying physics. Any observed discrep- 
ancy is subject to scrutiny, and explanations are sought 
in terms of the above points. 

The Vista and Sleuth algorithms provide a means for 
making the above three arguments, with a high threshold 
placed on the statistical significance of a discrepancy in 
order to minimize the chance of a false discovery claim. 
As described later, this threshold is the requirement that 
the false discovery rate is less than 0.001, after taking into 
account the total number of final states, distributions, or 
regions being examined. 



The traditional notions of signal and control regions 
are modified. Removing prejudice as to where new 
physics may appear, all regions of the data are treated 
as both signal and control. This analysis is not blind, 
but rather seeks to identify and understand discrepan- 
cies between data and the standard model prediction. 
With the goal of discovery, emphasis is placed on ex- 
amining discrepancies, focusing on outliers rather than 
global goodness of fit. Individual discrepancies that are 
not statistically significant are generally not pursued. 

Vista and Sleuth are employed simultaneously, 
rather than sequentially. An effect highlighted by 
Sleuth prompts additional investigation of the discrep- 
ancy, usually resulting in a specific hypothesis explaining 
the discrepancy in terms of a detector effect or adjust- 
ment to the standard model prediction that is then fed 
back and tested for global consistency using Vista.

Forming hypotheses for the cause of specific discrepan- 
cies, implementing those hypotheses to assess their wider 
consequences, and testing global agreement after the im- 
plementation are emphasized as the crucial activities for 
the investigator throughout the process of data analy- 
sis [111 ]. This process is constrained by the requirement 
that all adjustments be physically motivated. 

This search for new physics terminates when one of 
two conditions is satisfied: either a compelling case for
new physics is made, or there remain no statistically sig- 
nificant discrepancies on which a new physics case can 
be made. In the former case, to quantitatively assess 
the significance of the potential discovery, a full treat- 
ment of systematic uncertainties must be implemented. 
In the latter case, it is sufficient to demonstrate that all 
observed effects are not in significant disagreement with 
an appropriate global standard model description. 

This analysis uses data corresponding to an integrated 
luminosity of 927 pb$^{-1}$ of $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV
recorded by the CDF II detector. CDF II consists of a
charged particle tracking system composed of silicon strip 
detectors and a gas drift chamber inside a 1.4 T mag- 
netic field, surrounded by electromagnetic and hadronic 
calorimeters and enclosed by muon detectors. 

A standard set of object identification criteria is used 
to identify isolated and energetic objects produced in 
the hard collision, including electrons ($e^\pm$), muons ($\mu^\pm$),
taus ($\tau^\pm$), photons ($\gamma$), jets ($j$), jets originating from a
bottom quark ($b$), and missing momentum ($\not\!p$). Monte
Carlo event generators are used to determine the standard
model prediction. Vista partitions data and Monte
Carlo events into exclusive final states labeled according
to the objects ($e^\pm$, $\mu^\pm$, $\tau^\pm$, $\gamma$, $j$, $b$, $\not\!p$) identified in each
event. Each event belongs to one and only one exclusive
final state.
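
The partitioning into exclusive final states can be illustrated with a minimal sketch. The object names, their ordering, and the label format below are hypothetical choices made for illustration and are not taken from the Vista implementation; the only property carried over from the text is that each event maps to exactly one label.

```python
from collections import Counter

# Hypothetical object names and ordering, chosen only for this illustration.
OBJECT_ORDER = ["e+", "e-", "mu+", "mu-", "tau+", "tau-", "ph", "j", "b", "pmiss"]


def final_state_label(objects):
    """Return a canonical label built from the identified objects of one event."""
    counts = Counter(objects)
    unknown = set(counts) - set(OBJECT_ORDER)
    if unknown:
        raise ValueError(f"unknown object types: {sorted(unknown)}")
    parts = []
    for name in OBJECT_ORDER:
        n = counts.get(name, 0)
        if n == 1:
            parts.append(name)
        elif n > 1:
            parts.append(f"{n}{name}")  # multiplicity prefix, e.g. "2j"
    return " ".join(parts)


def partition(events):
    """Group events by label, so every event lands in exactly one final state."""
    states = {}
    for event in events:
        states.setdefault(final_state_label(event), []).append(event)
    return states
```

Because the label depends only on the multiset of identified objects, the order in which objects are reconstructed does not affect the final state an event is assigned to.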

A correction model is developed to improve systematic 
deficiencies in the standard model theoretical prediction 
and the simulation of the detector response. Achieving 
this on the entire high-$p_T$ dataset requires a framework



TABLE I: A subset of the Vista comparison between Tevatron
Run II data and the standard model prediction, showing
the final states with greatest discrepancies in population. Fi- 
nal states are labeled in this table according to the number 
and types of objects present, and are ordered according to 
decreasing discrepancy between the total number of events 
expected and the total number observed in the data. Only 
statistical uncertainties on the standard model prediction are 
shown; systematics are incorporated by allowing their values 
to float in the overall fit. A total of 344 populated exclusive 
final states are considered. 
for quickly implementing and testing modifications to the
correction model, including a quick fit for values of as- 
sociated correction factors. The specific details of the 
correction model are intentionally kept as simple as pos- 
sible in the interest of transparency in the event of a 
possible new physics claim. The details of this correction 
model are motivated by individual discrepancies noted 
in a global comparison of CDF high-$p_T$ data to the standard
model prediction. The correction model includes
specific correction factors for the integrated luminosity 
of the sample, the ratio ($k$-factor) of the actual cross section
for a standard model process to the usually leading
order approximation given by event generators, object 
identification efficiencies, object misidentification rates, 
and trigger efficiencies. A total of 44 correction factors 
are used, of which over twenty are constrained by external
information. A global $\chi^2$ is formed by comparison of
CDF data to the standard model prediction, and mini- 
mized as a function of these correction factors. Correc- 
tions to object identification efficiencies are typically less 
than 10%; fake rates are consistent with an understand- 
ing of the underlying physical mechanisms responsible; 
$k$-factors range from slightly less than unity to greater
than two for some processes with multiple jets. 
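
The structure of such a fit can be sketched as follows. This is a toy under stated assumptions, not the Vista fit: each correction factor is taken to scale one process template linearly, bin uncertainties are Gaussian, external information enters as Gaussian constraint terms, and the function names and the coordinate-descent minimizer are choices made here for brevity.

```python
def chi2(factors, data, templates, constraints):
    """Global chi-square: bin-by-bin terms plus Gaussian constraint terms.

    data:        list of (observed, uncertainty), one entry per bin
    templates:   templates[p][i] = uncorrected prediction of process p in bin i
    constraints: {process index: (central value, uncertainty)} from external input
    """
    total = 0.0
    for i, (obs, err) in enumerate(data):
        pred = sum(f * t[i] for f, t in zip(factors, templates))
        total += ((obs - pred) / err) ** 2
    for p, (central, err) in constraints.items():
        total += ((factors[p] - central) / err) ** 2
    return total


def fit(data, templates, constraints, sweeps=200):
    """Minimize chi2 by coordinate descent.

    chi2 is quadratic in each individual factor, so every single-factor
    update below is the exact one-dimensional minimum.
    """
    factors = [1.0] * len(templates)
    for _ in range(sweeps):
        for p, t in enumerate(templates):
            num = den = 0.0
            for i, (obs, err) in enumerate(data):
                others = sum(f * u[i]
                             for q, (f, u) in enumerate(zip(factors, templates))
                             if q != p)
                num += t[i] * (obs - others) / err ** 2
                den += t[i] ** 2 / err ** 2
            if p in constraints:
                central, err = constraints[p]
                num += central / err ** 2
                den += 1.0 / err ** 2
            factors[p] = num / den
    return factors
```

A constrained factor is pulled toward its external central value with a strength set by the quoted uncertainty, while an unconstrained factor is determined by the data alone.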

A global comparison of data to standard model pre- 
diction is made in 16,486 kinematic distributions in 344 
populated exclusive final states. In each final state, the 
number of events observed is compared with the standard 
model prediction, as shown in Table I, and the Poisson
probability that the number of predicted events would 
fluctuate up to or above (or down to or below) the ob- 
served number of events is calculated and converted into 
units of standard deviation. In each kinematic distri- 
bution, the shape of the data is compared to the shape 
of the standard model prediction using the Kolmogorov- 
FIG. 1: Distribution of Vista discrepancy between data and
the standard model prediction, measured in units of standard
deviation ($\sigma$), shown as the solid (green) histogram. The top
pane shows the distribution of discrepancies between the total
number of events observed and predicted in the 344 populated
final states considered. The bottom pane shows the distribution
of discrepancies between the observed and predicted
shapes of 16,486 kinematic distributions. In the bottom pane,
distributions in which data and the standard model prediction
are in agreement (large KS probability) correspond to
negative $\sigma$, and distributions in which the data and the standard
model prediction are in relative disagreement (small KS
probability) correspond to large positive $\sigma$. The expected
distributions are shown as the solid (black) curves. Interest is
focused on the entries in the tails of the top distribution and
the high tail of the bottom distribution.

Smirnov (KS) statistic, which is converted to a probabil- 
ity and then into units of standard deviation. 
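
The conversion of a final-state population into units of standard deviation, and the effect of a trials factor, can be sketched as below. The sign convention (positive for an excess), the one-sided Gaussian conversion, and the independence assumption in `with_trials` are assumptions of this sketch; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_pmf(n, mu):
    """P(N = n) for N ~ Poisson(mu), evaluated in log space (requires mu > 0)."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def poisson_upper(observed, mu):
    """P(N >= observed), summed directly to avoid cancellation in 1 - P(N < observed)."""
    total, n = 0.0, observed
    while True:
        term = poisson_pmf(n, mu)
        total += term
        if n > mu and term <= 1e-17 * total:
            return min(total, 1.0)
        n += 1


def poisson_lower(observed, mu):
    """P(N <= observed)."""
    return min(sum(poisson_pmf(n, mu) for n in range(observed + 1)), 1.0)


def to_sigma(p):
    """One-sided Gaussian significance corresponding to a tail probability p."""
    p = min(max(p, 1e-300), 1.0 - 1e-15)  # keep inv_cdf inside its open domain
    return -NormalDist().inv_cdf(p)


def population_sigma(observed, expected):
    """Signed discrepancy: positive for an excess, negative for a deficit."""
    up = poisson_upper(observed, expected)
    down = poisson_lower(observed, expected)
    magnitude = max(0.0, to_sigma(min(up, down)))
    return magnitude if up < down else -magnitude


def with_trials(p, n_trials):
    """Chance that at least one of n_trials independent tests yields a value <= p."""
    if p >= 1.0:
        return 1.0
    return -math.expm1(n_trials * math.log1p(-p))
```

For example, with 344 final states a single-state probability of $10^{-6}$ corresponds to roughly $3.4 \times 10^{-4}$ after the trials factor under the independence assumption, which is below 0.001, whereas $10^{-5}$ is not.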

VlSTA highlights final states and kinematic distribu- 
tions where the statistical significance of any discrepancy 
corresponds to a probability < 0.001 after accounting for 
the appropriate number of final states or distributions 
considered. The algorithm itself cannot determine
whether a particular discrepancy constitutes a discovery
of new physics. Physics judgement is required to determine
whether the discrepancy can be explained as a deficiency
in the modeling of the CDF II detector response
or in the calculation of the standard model prediction. 

A summary of the Vista comparison is shown in Fig. 1.
The numbers of events observed are in agreement with
the standard model prediction. The narrow core of the
histogram of Vista final states (top of Fig. 1) is due to
final states with few data events. The excess at large $\sigma$ in
the histogram of Vista distributions (bottom of Fig. 1)
shows disagreement between data and standard model
prediction in some distributions. The number of distributions
showing a significant ($> 3\sigma$ after the trials factor)
difference in shape between data and the standard model
prediction is 384. Of these, 312 are attributed to modeling
the parton radiation (with 186 of these 312 pointing
out that individual jet masses are larger in data than in
the prediction), and 59 reflect an inadequate modeling
of the overall transverse boost of the system ("intrinsic
$k_T$"). The nature of these discrepant distributions makes
it difficult to use them to support a new physics claim,
since at present these discrepancies appear most probably
due to an imperfect implementation of the standard
model prediction. Further investigation into obtaining an
adequate QCD-based description is continuing. The remaining
13 discrepant distributions arise from the coarseness
of the correction model. Additional details are provided
in Ref. [2].

Sleuth is simultaneously used to search
for evidence of new physics on the high-$p_T$ tails. Sleuth
is a quasi-model-independent search technique, based on
the assumption that new electroweak-scale physics will
manifest itself as a high-$p_T$ excess of data over the standard
model expectation in a particular final state. The
strengths and limitations of Sleuth follow directly from 
these assumptions. 

Sleuth considers a single variable, the summed scalar
transverse momentum ($\sum p_T$) of all objects in the event.
The standard model prediction for the distribution of
$\sum p_T$ is determined using the correction factors found
by Vista. For each final state, Sleuth determines the
most interesting region on the tail of this distribution. A
final state contains as many regions as data points, where
the $d$th region is defined as the semi-infinite interval with
lower bound equal to the $d$th largest data $\sum p_T$. The
$d$th region contains $d$ data events; the number of events
expected from the standard model is obtained by integrating
the predicted standard model $\sum p_T$ distribution
over this semi-infinite region.

For a region containing $d$ data points, $p_d$ is defined as
the Poisson probability that the standard model prediction
would fluctuate up to or above $d$. The most interesting
region $\mathcal{R}$ is defined as the region for which $p_d$ is
smallest. Pseudo experiments are performed by drawing
pseudo data from the standard model $\sum p_T$ distribution,
and the most interesting region is found for each pseudo
experiment. The fraction $\mathcal{P}$ of these pseudo experiments
FIG. 2: A test to see whether Sleuth would find evidence of
top quark pair production, if the top quark were not known.
Shown is the Sleuth $Wb\bar{b}jj$ final state, consisting of events
with one electron or muon, missing transverse energy, and
$\geq 4$ jets, at least one of which is $b$-tagged. The CDF Run II
data are shown as (black) filled circles, the standard model
prediction (minus $t\bar{t}$ production) is shown as the (red) shaded
stacked histogram, and the expected contribution from $t\bar{t}$ is
shown as the dashed line. Sleuth chooses the region with
$\sum p_T > 258$ GeV, shown by the (blue) arrow, and displayed
in the inset. In this region, the number of predicted standard
model events (sans $t\bar{t}$) is SM = 17, and the number
of observed data events is $d = 110$. Sleuth quantifies the
fraction of hypothetical similar experiments that would have
produced a region more interesting than the region chosen in
this final state, and finds $\mathcal{P}_{Wb\bar{b}jj} < 8.3 \times 10^{-7}$, corresponding
to a value of $\tilde{\mathcal{P}}$ that easily satisfies Sleuth's discovery
threshold of $\tilde{\mathcal{P}} < 0.001$.



producing a region more interesting than the region $\mathcal{R}$
found in the data quantifies the interest of this final state.
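
The region search and the pseudo-experiment procedure for a single final state can be sketched as follows. This is a toy under stated assumptions, not the Sleuth code: the predicted distribution is represented as weighted simulated events, pseudo data are resampled from those same events, a pseudo experiment counts when its best region is at least as interesting as the one in the data, and all function names are hypothetical.

```python
import math
import random


def poisson_upper(d, mu):
    """P(N >= d) for N ~ Poisson(mu), summed term by term."""
    if d <= 0:
        return 1.0
    if mu <= 0.0:
        return 0.0
    total, n = 0.0, d
    while True:
        term = math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))
        total += term
        if n > mu and term <= 1e-17 * total:
            return min(total, 1.0)
        n += 1


def most_interesting_region(data, prediction):
    """Return (smallest p_d, lower bound) over the regions defined by the data.

    data:       summed scalar pT values, one per observed event
    prediction: (sum_pt, weight) pairs describing the predicted distribution
    """
    best = (1.0, None)
    for d, bound in enumerate(sorted(data, reverse=True), start=1):
        # Expected yield: integrate the prediction over [bound, infinity).
        expected = sum(w for x, w in prediction if x >= bound)
        p_d = poisson_upper(d, expected)
        if p_d < best[0]:
            best = (p_d, bound)
    return best


def draw_poisson(rng, mu):
    """Knuth's multiplication method; adequate for modest means."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def final_state_interest(data, prediction, n_pseudo=1000, seed=1):
    """Fraction of pseudo experiments whose best region is at least as
    interesting as the best region found in the data."""
    rng = random.Random(seed)
    p_data, _ = most_interesting_region(data, prediction)
    values = [x for x, _ in prediction]
    weights = [w for _, w in prediction]
    total = sum(weights)
    hits = 0
    for _ in range(n_pseudo):
        n = draw_poisson(rng, total)
        pseudo = rng.choices(values, weights=weights, k=n)
        p_pseudo, _ = most_interesting_region(pseudo, prediction)
        if p_pseudo <= p_data:
            hits += 1
    return hits / n_pseudo
```

With a finite number of pseudo experiments the returned fraction can only resolve values down to about `1 / n_pseudo`, so a very significant excess is reported as an upper bound rather than a precise value.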

Considering all final states, Sleuth determines the
most interesting final state in the CDF high-$p_T$ data, and
calculates $\tilde{\mathcal{P}}$, the fraction of hypothetical similar CDF
experiments that would have produced a region in any
final state more interesting than the most interesting region
in the most interesting final state. In calculating
$\tilde{\mathcal{P}}$, Sleuth rigorously accounts for the number of final
states that have been considered. With an accurate correction
model and in the absence of new physics, the
distribution of $\tilde{\mathcal{P}}$ is uniform between zero and unity; in
the presence of new physics, small $\tilde{\mathcal{P}}$ is expected. The
threshold for pursuit of a possible discovery case is taken
to be $\tilde{\mathcal{P}} < 0.001$.
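
As an illustration of why a trials-corrected measure of this kind is uniformly distributed in the absence of new physics, consider a simplified setting that is an assumption of this sketch and not necessarily the exact Sleuth prescription: $N$ statistically independent final states, each with a per-final-state fraction $\mathcal{P}_i$ uniform on $[0, 1]$, and the combined measure written as $\tilde{\mathcal{P}}$.

```latex
\Pr\!\left(\min_i \mathcal{P}_i \le x\right) = 1 - (1 - x)^N
\quad\Longrightarrow\quad
\tilde{\mathcal{P}} = 1 - \left(1 - \mathcal{P}_{\min}\right)^N ,
\qquad
\tilde{\mathcal{P}} \approx N\,\mathcal{P}_{\min}
\ \ \text{for}\ \ N\,\mathcal{P}_{\min} \ll 1 .
```

Under these assumptions, a threshold of 0.001 on the combined measure corresponds to requiring the smallest per-final-state fraction to be below roughly $0.001 / N$.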

Figure 2 shows a sensitivity test in which the standard
model process $p\bar{p} \to t\bar{t}$ is subtracted from the standard
model background and observed as an excess in the CDF
data. Sleuth observes the top quark with an integrated
Run II luminosity comparable to that accumulated by
CDF and D0 in Tevatron Run I when the top quark discovery
was announced [13]. Several other sensitivity
tests have been conducted with pseudo signal events injected
into pseudo data drawn from the standard model
prediction. On these sensitivity tests, Sleuth performs 
comparably to targeted searches for phenomena satisfy- 
ing Sleuth's basic assumptions that new physics will 
appear as an excess of data over the standard model pre- 
diction at large summed scalar transverse momentum in 
one primary final state. 

In 927 pb$^{-1}$ of CDF Run II data, Sleuth finds
$\tilde{\mathcal{P}} = 0.46$. Assuming any deficiencies in the standard model
implementation and detector simulation are accurately 
resolved by the correction model, the fraction of hypo- 
thetical similar CDF experiments that would observe 
something as interesting as the most interesting region 
observed in the CDF Run II data is 46%. None of the 
regions examined surpass Sleuth's discovery threshold. 
Further discussion of the most interesting regions is provided
in Ref. [2].

In conclusion, a broad search for new physics (Vista) 
has been performed in 927 pb$^{-1}$ of CDF Run II data. A
complete standard model background estimate has been 
obtained and compared with data in 344 populated exclu- 
sive final states and 16,486 relevant kinematic distribu- 
tions. Consideration of exclusive final state populations 
yields no statistically significant ($> 3\sigma$) discrepancy after
the trials factor is accounted for. Quantifying the
difference in shape of kinematic distributions using the 
Kolmogorov-Smirnov statistic, significant discrepancies 
are observed between data and standard model prediction.
These discrepancies are believed to arise from mismodeling
of the parton shower and intrinsic $k_T$, and represent
observables for which a QCD-based understanding
is highly motivated. None of the shape discrepancies
highlighted motivates a new physics claim. 

A further systematic search (Sleuth) for regions of 
excess on the high-$\sum p_T$ tails of exclusive final states has
been performed, representing a quasi-model-independent 
search for new electroweak scale physics. A measure of 
interest rigorously accounting for the trials factor asso- 
ciated with looking in many regions is defined, and used 
to quantify the most interesting region observed in the 
CDF Run II data. No region of excess on the high-$\sum p_T$
tail of any of the Sleuth exclusive final states surpasses 
the discovery threshold. 

Although this global analysis of course cannot prove
that no new physics is hiding in these data, this broad
search of the Tevatron Run II data represents one of the
most encompassing tests of the particle physics
standard model at the energy frontier.